Some intriguing properties of Tukey's half-space depth
For multivariate data, Tukey's half-space depth is one of the most popular
depth functions available in the literature. It is conceptually simple and
satisfies several desirable properties of depth functions. The Tukey median,
the multivariate median associated with the half-space depth, is also a
well-known measure of center for multivariate data with several interesting
properties. In this article, we derive and investigate some further
properties of half-space depth and its associated multivariate median. These
properties, some of which are counterintuitive, have important statistical
consequences in multivariate analysis. We also investigate a natural extension
of Tukey's half-space depth and the related median for probability
distributions on any Banach space (which may be finite- or
infinite-dimensional) and prove some results that demonstrate anomalous
behavior of half-space depth in infinite-dimensional spaces.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/10-BEJ322 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
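To make the definition concrete: the half-space depth of a point x is the smallest probability mass in a closed half-space whose boundary passes through x, and the Tukey median maximizes this depth. The following is a minimal sketch for the empirical distribution of a bivariate sample, assuming that checking a finite grid of directions is an acceptable approximation (it can only overestimate the exact depth). It is not the article's method, and the function names are hypothetical. The "median" search is restricted to sample points, a crude stand-in for the true Tukey median.

```python
import numpy as np


def halfspace_depth(x, data, n_directions=360):
    """Approximate Tukey half-space depth of point x w.r.t. a bivariate sample.

    The depth is the smallest fraction of sample points lying in a closed
    half-plane whose boundary passes through x. Only n_directions equally
    spaced unit normals are checked, so the value is an upper bound on the
    exact empirical depth.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    normals = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (k, 2)
    # proj[i, j] = <u_j, X_i - x>; X_i lies in the closed half-plane
    # {y : <u_j, y - x> >= 0} when proj[i, j] >= 0 (small tolerance for rounding).
    proj = (data - x) @ normals.T
    counts = (proj >= -1e-12).sum(axis=0)
    return counts.min() / len(data)


def approx_tukey_median(data, n_directions=360):
    """Return the sample point of maximal (approximate) depth and its depth.

    Hypothetical simplification: the true Tukey median need not be a sample
    point, so this only searches among the observations.
    """
    data = np.asarray(data, dtype=float)
    depths = np.array([halfspace_depth(p, data, n_directions) for p in data])
    best = int(np.argmax(depths))
    return data[best], depths[best]


# Usage: the four corners of the unit square plus its center.
sample = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
print(halfspace_depth([0.5, 0.5], sample))  # 0.6
print(halfspace_depth([5, 5], sample))      # 0.0 (outside the convex hull)
print(approx_tukey_median(sample))
```

Every closed half-plane through the center contains the center and at least one point of each diagonally opposite pair of corners, which is why its depth is 3/5.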
Swords: a statistical tool for analysing large DNA sequences
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence and have been implemented in a software package called swords. Using sequences available in public-domain databases on the Internet, we demonstrate how molecular biologists and geneticists can conveniently use swords to unmask biologically important features hidden in large sequences and to assess their statistical significance.
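The abstract does not describe swords' interface, so the following is only a minimal sketch of the underlying idea, assuming that "DNA words" are overlapping k-letter substrings over {A, C, G, T}. The function names and the use of total variation distance to compare two word-frequency distributions are illustrative assumptions, not part of swords; significance assessment is not shown.

```python
from itertools import product


def word_counts(seq, k):
    """Count overlapping DNA words (k-mers) over the alphabet ACGT.

    Every one of the 4**k possible words gets an entry (possibly 0); windows
    containing other symbols (e.g. N for an unknown base) are skipped.
    """
    counts = dict.fromkeys(("".join(w) for w in product("ACGT", repeat=k)), 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    return counts


def word_frequencies(seq, k):
    """Relative frequency distribution of k-letter words in seq."""
    counts = word_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {w: 0.0 for w in counts}
    return {w: c / total for w, c in counts.items()}


def total_variation(seq_a, seq_b, k):
    """Total variation distance between the k-word distributions of two sequences."""
    fa, fb = word_frequencies(seq_a, k), word_frequencies(seq_b, k)
    return 0.5 * sum(abs(fa[w] - fb[w]) for w in fa)


print(word_counts("ACGTAC", 2)["AC"])      # 2
print(total_variation("AAAA", "TTTT", 1))  # 1.0
```

Keeping all 4**k words, including unseen ones, makes frequency vectors from different sequences directly comparable word by word.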
On estimators of the mean of infinite dimensional data in finite populations
The Horvitz-Thompson (HT), the Rao-Hartley-Cochran (RHC) and the generalized
regression (GREG) estimators of the finite population mean are considered, when
the observations are from an infinite-dimensional space. We compare these
estimators based on their asymptotic distributions under some commonly used
sampling designs and some superpopulations satisfying linear regression models.
We show that the GREG estimator is asymptotically at least as efficient as
either of the other two estimators under the different sampling designs
considered in this paper. Further, we show that the use of some well-known sampling designs
utilizing auxiliary information may have an adverse effect on the performance
of the GREG estimator, when the degree of heteroscedasticity present in linear
regression models is not very large. On the other hand, the use of those
sampling designs improves the performance of this estimator, when the degree of
heteroscedasticity present in linear regression models is large. We develop
methods for determining the degree of heteroscedasticity, which in turn
determines the choice of appropriate sampling design to be used with the GREG
estimator. We also investigate the consistency of the covariance operators of
the above estimators. We carry out numerical studies using real and
synthetic data, and their results support our theoretical findings.
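For orientation, the HT estimator of the mean is N^{-1} sum_{i in s} y_i / pi_i, and a GREG estimator adds the correction N^{-1} (t_x - hat t_x)^T hat B, where t_x is the known population total of the auxiliary vector, hat t_x its HT estimate, and hat B a design-weighted least-squares coefficient. The following minimal sketch assumes simple random sampling without replacement (so pi_i = n/N), curves observed on a common finite grid as a stand-in for infinite-dimensional data, a scalar auxiliary variable z with known population total, and a homoscedastic working model y = a + b z. The function names and the synthetic population are hypothetical; the RHC estimator, heteroscedastic weighting, and the specific designs studied in the paper are not shown.

```python
import numpy as np


def ht_mean(y_s, pi_s, N):
    """Horvitz-Thompson estimator of the finite population mean.

    y_s  : (n, p) sampled observations, e.g. curves evaluated on a common grid
    pi_s : (n,) first-order inclusion probabilities of the sampled units
    N    : population size
    """
    return (y_s / pi_s[:, None]).sum(axis=0) / N


def greg_mean(y_s, z_s, pi_s, z_total, N):
    """GREG estimator of the population mean under the working model y = a + b*z.

    Assumes a scalar auxiliary variable z with known population total z_total
    and a homoscedastic working model (regression weights 1/pi_i only).
    """
    X_s = np.column_stack([np.ones_like(z_s), z_s])  # (n, 2): intercept and z
    XtW = X_s.T / pi_s                                # column i scaled by 1/pi_i
    B = np.linalg.solve(XtW @ X_s, XtW @ y_s)         # (2, p) weighted LS fit
    t_x = np.array([N, z_total], dtype=float)         # known totals of (1, z)
    t_x_hat = XtW.sum(axis=1)                         # HT estimates of those totals
    return ht_mean(y_s, pi_s, N) + (t_x - t_x_hat) @ B / N


# Usage: simple random sampling without replacement, so pi_i = n / N.
rng = np.random.default_rng(1)
N, n = 500, 50
grid = np.linspace(0.0, 1.0, 25)
z = rng.uniform(1.0, 3.0, size=N)
y = np.outer(z, np.sin(2 * np.pi * grid)) + rng.normal(0.0, 0.2, size=(N, grid.size))
idx = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)
true_mean = y.mean(axis=0)
err_ht = np.linalg.norm(ht_mean(y[idx], pi, N) - true_mean)
err_greg = np.linalg.norm(greg_mean(y[idx], z[idx], pi, z.sum(), N) - true_mean)
print(err_ht, err_greg)
```

When every curve is exactly linear in z, the GREG estimate equals the population mean for any sample, which makes a convenient sanity check. A single random draw like the one above need not reflect the asymptotic efficiency comparison described in the abstract.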
- …